Building an Automatic Thesaurus to Enhance Information Retrieval

Author

  • Essam Hanandeh
Abstract

One of the major problems of modern Information Retrieval (IR) systems is the vocabulary problem: the discrepancy between the terms used to describe documents and the terms researchers use to describe their information needs. We implemented an automatic thesaurus built on the Vector Space Model (VSM), using the cosine measure of similarity. The experiments use a selected collection of 242 Arabic abstracts, all in computer science and information systems. The main goal of this paper is to design and build an automatic Arabic thesaurus using term-term similarity that can be used in any specialized field or domain to improve query expansion and retrieve more relevant documents for the user's query. The study concluded that the similarity thesaurus improved recall and precision compared with a traditional information retrieval system.
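
The approach summarized in the abstract (term vectors in a Vector Space Model, compared with cosine similarity, then used for query expansion) can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the raw term-frequency weighting, the whitespace tokenization, and the 0.5 similarity threshold are all assumptions chosen for illustration, and the sample documents are invented English stand-ins for the Arabic abstracts.

```python
import math
from collections import Counter


def build_term_vectors(docs):
    """Map each term to a vector of its raw frequencies across documents."""
    vectors = {}
    for doc_id, text in enumerate(docs):
        # Whitespace tokenization is an assumption; real Arabic text
        # would need proper normalization and stemming.
        for term, freq in Counter(text.split()).items():
            vectors.setdefault(term, [0] * len(docs))[doc_id] = freq
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_thesaurus(docs, threshold=0.5):
    """Link every pair of terms whose cosine similarity meets the threshold."""
    vectors = build_term_vectors(docs)
    terms = sorted(vectors)
    thesaurus = {t: [] for t in terms}
    for i, t1 in enumerate(terms):
        for t2 in terms[i + 1:]:
            sim = cosine(vectors[t1], vectors[t2])
            if sim >= threshold:  # hypothetical cutoff, not from the paper
                thesaurus[t1].append((t2, sim))
                thesaurus[t2].append((t1, sim))
    return thesaurus


def expand_query(query, thesaurus):
    """Append thesaurus neighbours of each query term, without duplicates."""
    expanded = list(query.split())
    for term in query.split():
        for related, _ in thesaurus.get(term, []):
            if related not in expanded:
                expanded.append(related)
    return expanded


if __name__ == "__main__":
    sample_docs = ["database index query", "database index", "network router"]
    print(expand_query("database", build_thesaurus(sample_docs)))
```

Terms that co-occur in the same documents end up with similar vectors, so a query for one of them is expanded with the others; a higher threshold yields a smaller, more conservative thesaurus.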

Similar resources

Automatic Thai Ontology Construction and Maintenance System

Ontology is an essential resource to enhance the performance of Information Processing system such as information integration, document classification in taxonomies, including information retrieval and data cleaning in database system. This paper proposes three methodologies for Automatic Thai Ontology Construction and Maintenance from technical corpus, dictionary and thesaurus. For corpus base...

Construction of a Condensed Thesaurus for Building Radiology Ontology

The building of thesauri for large domains, especially for medicine, is a costly affair. However, in many domains thesauri can be constructed on an ontological basis [Wielinga , Schreiber, 2001]. We are developing an ontological information retrieval system for the retrieving of medical records from an electronic medical record system (EMR). We decided to use the UMLS as a basis for building th...

Bilingual Indexing for Information Retrieval with AUTINDEX

AUTINDEX is a bilingual automatic indexing system for the two languages German and English. It is being developed within the EU-funded BINDEX project. The aim of the system is to automatically index large quantities of abstracts of scientific and technical papers from several areas of engineering. Automatic indexing takes place using a controlled vocabulary provided in monolingual and bilingual...

VIII. An Experiment in Automatic Thesaurus Construction

A method is presented for the automatic construction of thesauruses used in information retrieval systems. The construction algorithm is based on the concept-concept associations displayed in a sample document collection.

A Conceptual Framework For Automatic And Dynamic Thesaurus Updating In Information Retrieval Systems

This paper aims at presenting a methodology for automatic thesaurus construction in order to help the search of documents and we want to obtain the development of classes for specific topics (for a given corpus) without a priori semantic information. Information contained in the thesaurus lead to new search formulations via automatic and/or user feedback. This presentation even being theoretica...

Publication date: 2013